Methods to Reduce I/O for Decision Tree Classifiers
Authors
Abstract
Classification is an important data mining problem. Although datasets can be quite large in data mining applications, it can be advantageous to use the entire training dataset rather than a sample of it, since using all of the data can increase accuracy. I/O is a significant component of overall execution time in many decision tree classifiers. We present some new optimizations that work with many of these classifiers on both sequential and parallel processors. For ease of explanation, we describe these optimizations mostly in the context of SPRINT, a recently developed classifier for large problems in which the training datasets may be disk resident.
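The abstract does not spell out the optimizations themselves. As background on where SPRINT's I/O comes from, here is a minimal sketch, assuming the attribute-list layout of the original SPRINT classifier: each numeric attribute is stored as a list of (value, class label, record id) entries, sorted once, and a node's best split is found by one sequential scan that maintains class histograms below and above each candidate threshold and scores it with the Gini index. This is not the method proposed in this paper, and the names `gini` and `best_numeric_split` are hypothetical. When the lists are disk resident, each such scan, plus the pass that partitions the lists among a node's children, reads the data in full, which is the kind of I/O cost the abstract refers to.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One entry of a SPRINT-style attribute list: (attribute value, class label, record id).
Entry = Tuple[float, str, int]


def gini(counts: Counter, total: int) -> float:
    """Gini index 1 - sum_j p_j^2 of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(attr_list: List[Entry]) -> Tuple[Optional[float], float]:
    """Scan a pre-sorted attribute list once; return (threshold, weighted Gini).

    The split tested is "value <= threshold", with the threshold placed midway
    between two adjacent distinct values. Returns (None, inf) if no split exists.
    """
    n = len(attr_list)
    below: Counter = Counter()
    above: Counter = Counter(label for _, label, _ in attr_list)
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for i, (value, label, _rid) in enumerate(attr_list):
        # Move the current record from the "above" histogram to "below".
        below[label] += 1
        above[label] -= 1
        n_below = i + 1
        n_above = n - n_below
        # Only split between distinct values; the last record leaves nothing above.
        if n_above == 0 or attr_list[i + 1][0] == value:
            continue
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best_score:
            best_score = score
            best_threshold = (value + attr_list[i + 1][0]) / 2
    return best_threshold, best_score


if __name__ == "__main__":
    # Hypothetical toy data: (age, risk class, record id).
    records = [(23.0, "high", 0), (17.0, "high", 1), (43.0, "high", 2),
               (68.0, "low", 3), (32.0, "low", 4), (20.0, "high", 5)]
    # The list is sorted once up front; every later scan is sequential.
    print(best_numeric_split(sorted(records)))
```

On this toy list the scan picks the threshold 27.5 (ages 17, 20 and 23 below, all "high") with a weighted Gini of 2/9. The single sequential pass is what makes the layout friendly to disk-resident data: no random access is needed during split evaluation, so the cost is dominated by how many times the lists are read and rewritten.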
Similar resources
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretability are two conflicting measures. In this approach, we consider both the accuracy and the interpretability of rule sets. Additionally, individual classifiers face other problems, such as very large, high-dimensional data sets with imbalanced class distributions. This...
A novel hybrid method for vocal fold pathology diagnosis based on Russian language
In this paper, an initial feature vector for vocal fold pathology diagnosis is first proposed. A genetic algorithm is then proposed to optimize this initial feature vector. Experiments are carried out to evaluate and compare the classification accuracies obtained with different classifiers (ensemble of decision trees, discriminant analysis and K-nearest neig...
Cost Complexity Pruning of Ensemble Classifiers
In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the means to efficiently scale learning to large datasets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system...
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with the approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm uses stopping criteria to stop the tree's growth early, the threshold values of those criteria control the number of nodes. Finding a proper threshold value for a stopping crite...
Performance Evaluation of Decision Tree Classifiers on Medical Datasets
In data mining, classification is one of the significant techniques, with applications in fraud detection, artificial intelligence, medical diagnosis and many other fields. Classifying objects into predefined categories based on their features is a widely studied problem. Decision trees are very useful to physicians in diagnosing a patient's problem. Decision tree classifiers are use...